Object hallucination

Large vision-language models (LVLMs) tend to hallucinate objects that do not exist in the image, likely because of strong language priors and spurious object co-occurrence statistics in the training data.

Types

Questions

Evaluation

Rohrbach2018object proposed the simple CHAIR metric to quantify object hallucination. The instance-level variant CHAIR_i is the fraction of mentioned objects that do not appear in the image, and the sentence-level variant CHAIR_s is the fraction of captions that contain at least one hallucinated object.
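A minimal sketch of the two CHAIR variants, assuming the objects mentioned in each caption and the ground-truth objects per image have already been extracted as sets of strings (the original work maps MSCOCO synonyms to the 80 category names before counting); the function name is illustrative.

```python
def chair_scores(caption_objects: list[set[str]],
                 gt_objects: list[set[str]]) -> tuple[float, float]:
    """Return (CHAIR_i, CHAIR_s).

    CHAIR_i: hallucinated object mentions / all object mentions.
    CHAIR_s: captions with >= 1 hallucinated object / all captions.
    """
    hallucinated_mentions = 0
    total_mentions = 0
    hallucinated_captions = 0
    for mentioned, present in zip(caption_objects, gt_objects):
        fake = mentioned - present  # mentioned but not in the image
        hallucinated_mentions += len(fake)
        total_mentions += len(mentioned)
        hallucinated_captions += bool(fake)
    chair_i = hallucinated_mentions / max(total_mentions, 1)
    chair_s = hallucinated_captions / max(len(caption_objects), 1)
    return chair_i, chair_s


# Example: the first caption hallucinates "dog", the second is clean.
print(chair_scores([{"person", "dog"}, {"cat"}],
                   [{"person"}, {"cat", "sofa"}]))  # -> (0.333..., 0.5)
```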

Li2023evaluating proposed POPE (Polling-based Object Probing Evaluation). The idea is to ask yes-or-no questions about whether an object is present in the image, where the nonexistent (negative) objects are sampled with three strategies: random, popular, and adversarial. Popular sampling picks the top-k most frequent objects in the dataset, while adversarial sampling picks the top-k objects that most often co-occur with the ground-truth objects; a sketch of the three strategies follows.
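A minimal sketch of the three negative-object sampling strategies, assuming precomputed dataset-level object frequencies (`freq`) and a per-object co-occurrence table (`cooccur`); these names and the function signature are illustrative, not the authors' released code.

```python
import random
from collections import Counter


def sample_negatives(gt_objects: set[str],
                     all_objects: list[str],
                     freq: Counter,
                     cooccur: dict[str, Counter],
                     k: int,
                     strategy: str) -> list[str]:
    """Sample k objects absent from the image to ask "Is there a <obj> in the image?"."""
    candidates = [o for o in all_objects if o not in gt_objects]
    if strategy == "random":
        return random.sample(candidates, k)
    if strategy == "popular":
        # most frequent objects in the dataset that are not in this image
        ranked = [o for o, _ in freq.most_common() if o not in gt_objects]
        return ranked[:k]
    if strategy == "adversarial":
        # objects that most often co-occur with the ground-truth objects
        scores = Counter()
        for g in gt_objects:
            scores.update(cooccur.get(g, Counter()))
        ranked = [o for o, _ in scores.most_common() if o not in gt_objects]
        return ranked[:k]
    raise ValueError(f"unknown strategy: {strategy}")
```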

Mitigation

Liu2024reducing studies both why hallucinations arise in LVLMs and how to mitigate them.